• 1 Introduction
  • 2 Example data
  • 3 Running the analysis
    • 3.1 Initialization
      • 3.1.1 Load packages
      • 3.1.2 Set directories
      • 3.1.3 Load global parameters
      • 3.1.4 Set directories from parameters
      • 3.1.5 Targets preparation
    • 3.2 Block 01-Load-QC-Norm-Filt
      • 3.2.1 Load the parameters for this block
      • 3.2.2 Load the targets file
      • 3.2.3 Load raw data and create ExpressionSet object
      • 3.2.4 Quality control (QC) of raw data
      • 3.2.5 QC of raw data in one step
      • 3.2.6 Normalization
      • 3.2.7 Save annotations of summarized probes
      • 3.2.8 Quality control of normalized data
      • 3.2.9 QC of normalized data in one step
      • 3.2.10 Sample exclusion
      • 3.2.11 Filtering
      • 3.2.12 Save data from this block (normData.Rda object)
    • 3.3 Block 02-Differential Expression Analysis (DEA)
      • 3.3.1 Background
      • 3.3.2 Load the parameters for this block and set directories
      • 3.3.3 Data for DEA
      • 3.3.4 Building the linear model
      • 3.3.5 Compute the contrasts and moderated t-tests
      • 3.3.6 TopTables
      • 3.3.7 Results summary
      • 3.3.8 Volcano plots
      • 3.3.9 All DEA in one function
    • 3.4 Block 03 Analysis of Multiple Comparisons
      • 3.4.1 Background
      • 3.4.2 Load the parameters for MC analysis and set directories
      • 3.4.3 Data for MC
      • 3.4.4 MC Venn Upset
      • 3.4.5 MC Heatmap plots
    • 3.5 Block 04 A-Analysis of Biological Significance (ABS-GSEA)
      • 3.5.1 Background
      • 3.5.2 Load the parameters for ABS-GSEA and set directories
      • 3.5.3 Data for ABS-GSEA
      • 3.5.4 GO unfilt GSEA
      • 3.5.5 GO filt GSEA plots
      • 3.5.6 Reactome pathways unfilt GSEA
      • 3.5.7 Reactome filt GSEA plots
    • 3.6 Block 04 B-Analysis of Biological Significance (ABS-ORA)
      • 3.6.1 Background
      • 3.6.2 Load the parameters for ABS-ORA and set directories
      • 3.6.3 Data for ABS-ORA
      • 3.6.4 GO unfilt ORA
      • 3.6.5 GO filt ORA plots
      • 3.6.6 Reactome unfilt ORA
      • 3.6.7 Reactome filt ORA plots
  • 4 References

1 Introduction

The maUEB package contains functions to analyze data from microarray experiments. This vignette shows an example of using the maUEB package to run a complete microarray analysis. The workflow includes the following steps:

  1. Initialization
    • set up project directories
    • load required packages
    • set up the global parameters of the study
    • prepare the targets dataframe
  2. Data for the analysis
    • Sample data: load the targets dataframe
    • Microarray data: read or load raw data
  3. Data preprocessing and Quality Control
    • Quality control of raw data
    • Background subtraction, normalization and summarization
    • Quality Control of normalized data and sources of variability
    • Filtering
  4. Differential expression analysis
    • Top tables
    • Volcano plots
  5. Multiple comparisons analysis
    • Venn diagrams
    • Expression profiles
  6. Analysis of Biological Significance
    • GO analysis
    • Reactome analysis

2 Example data

The example is based on a study deposited in the Gene Expression Omnibus (GEO) and published in ref. The study is based on 12 samples hybridized on Clariom S Mouse Arrays from Thermo Fisher. The experimental condition considered in this study is:

  • Treatment:
    • 4 samples from untreated mice (CTL)
    • 4 samples from mice treated with PD1 (PD1)
    • 4 samples from mice treated with CMP (CMP)

For illustration purposes, an imaginary Batch factor will be added to the sample data.
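This artificial assignment can be sketched in base R as follows (illustrative only: the sample short names and the alternating 1/2 batch pattern match the filled targets shown later in section 3.2.2):

```r
# Illustrative sketch of the imaginary Batch factor (not part of the package).
# Samples alternate between two hypothetical batches.
sample_names <- c(paste0("CTL.", 1:4), paste0("PD1.", 1:4), paste0("CMP.", 1:4))
batch <- factor(rep(c(1, 2), times = 6))
data.frame(ShortName = sample_names, Batch = batch)
```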

The following group of comparisons will be performed to determine the effects of each treatment on gene expression:

  • Effect of Treatment:
    • PD1vsCTL = PD1 - CTL
    • CMPvsCTL = CMP - CTL
    • PD1vsCMP = PD1 - CMP

3 Running the analysis

3.1 Initialization

3.1.1 Load packages

if (!require(maUEB)){
    require(devtools)
    install_github("uebvhir/maUEB", build_vignettes = FALSE)
}
## Loading required package: maUEB
## 
## Registered S3 method overwritten by 'enrichplot':
##   method               from
##   fortify.enrichResult DOSE
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Registered S3 method overwritten by 'seriation':
##   method         from 
##   reorder.hclust gclus
## Registered S3 methods overwritten by 'ff':
##   method             from
##   maxindex.default   bit 
##   poslength.default  bit 
##   maxindex.bitwhich  bit 
##   maxindex.bit       bit 
##   poslength.bitwhich bit 
##   poslength.bit      bit
## Warning: replacing previous import 'limma::backgroundCorrect' by
## 'oligo::backgroundCorrect' when loading 'maUEB'
## Warning: replacing previous import 'heatmaply::normalize' by 'oligo::normalize'
## when loading 'maUEB'
## Warning: replacing previous import 'AnnotationDbi::select' by 'plotly::select'
## when loading 'maUEB'
## Warning: replacing previous import 'ggplot2::last_plot' by 'plotly::last_plot'
## when loading 'maUEB'
## Warning: replacing previous import 'AnnotationDbi::tail' by 'utils::tail' when
## loading 'maUEB'
## Warning: replacing previous import 'AnnotationDbi::head' by 'utils::head' when
## loading 'maUEB'
## Warning: replacing previous import 'ReportingTools::page' by 'utils::page' when
## loading 'maUEB'

3.1.2 Set directories

# Set working directory (points to the vignette directory)
workingDir  <- here::here("vignettes")
# Parameters directory
paramDir <- file.path(workingDir, "parameters")

3.1.3 Load global parameters

source(file.path(paramDir, "global_parameters.par"))

For this example study, global parameters were set as follows:

(global_parameters <- read.table(file.path(paramDir, "global_parameters.par"), header = FALSE, sep=";", as.is=TRUE))
##                                                            V1
## 1                                          dataDirN <- dades 
## 2                                       celDirN <-  celfiles 
## 3                                     resultsDirN <- results 
## 4                                           client <- RSRCHR 
## 5                                                ID <- STUDY 
## 6 descriptors <- c(FileName,Group, ShortName, Colors, Batch) 
## 7           annotPackage <- clariomsmousetranscriptcluster.db
## 8                                         organismsp <- mouse

3.1.4 Set directories from parameters

dataDir <- file.path(workingDir, dataDirN)
celDir <-  file.path(workingDir, celDirN)
resultsDir <- file.path(workingDir, resultsDirN)

3.1.5 Targets preparation

The build_targets() function can be used to build a targets template dataframe that is then filled in manually. The template contains the array file names in rows and the variables of interest in columns. The first column (FileName) contains the names of the cel files in the celfiles directory specified in the parameters. The remaining columns correspond to the descriptors specified in the global parameters, which include the following: Group, ShortName and Colors. Other variables, such as Batch, PatientID or any other variable of interest, may be added by the user. The template is saved as a semicolon-separated file named targets.RSRCHR.STUDY.template.csv, where RSRCHR refers to the client acronym and STUDY to the study UEB id.

An example of the targets generated for this study is shown below:

build_targets(inputDir=celDir, outputDir=dataDir, client=client, ID=ID, descriptors=descriptors)
(targets_template <- read.table(file.path(dataDir, paste0("targets.", client, ".", ID, ".template.csv")), header = TRUE, sep = ";", as.is=TRUE)) # Note: R interprets a double quote as a grave accent here unless it is changed
##                    FileName Group ShortName Colors Batch
## 1  GSM3730337_A1-LT_1Wk.CEL     0         0      0     0
## 2  GSM3730338_A2-LT_1Wk.CEL     0         0      0     0
## 3  GSM3730339_A3-LT_1Wk.CEL     0         0      0     0
## 4  GSM3730340_A4-LT_1Wk.CEL     0         0      0     0
## 5  GSM3730341_B1-LT_1Wk.CEL     0         0      0     0
## 6  GSM3730342_B2-LT_1Wk.CEL     0         0      0     0
## 7  GSM3730343_B3-LT_1Wk.CEL     0         0      0     0
## 8  GSM3730344_B4-LT_1Wk.CEL     0         0      0     0
## 9  GSM3730345_E1-LT_1Wk.CEL     0         0      0     0
## 10 GSM3730346_E2-LT_1Wk.CEL     0         0      0     0
## 11 GSM3730347_E4-LT_1Wk.CEL     0         0      0     0
## 12 GSM3730348_E5-LT_1Wk.CEL     0         0      0     0

Then, the information for each descriptor needs to be filled in manually in Excel/LibreOffice and saved as the CSV file targets.RSRCHR.STUDY.csv, using ';' as the field separator.
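Alternatively, the template can be filled in programmatically and written back with ';' as the field separator. A minimal base-R sketch (only the first two samples are shown; the group labels, short names and colors follow the filled targets displayed in section 3.2.2):

```r
# Fill the targets template in R instead of a spreadsheet (sketch, first 2 samples).
targets_filled <- data.frame(
  FileName  = c("GSM3730337_A1-LT_1Wk.CEL", "GSM3730338_A2-LT_1Wk.CEL"),
  Group     = c("CTL", "CTL"),
  ShortName = c("CTL.1", "CTL.2"),
  Colors    = c("aquamarine3", "aquamarine3"),
  Batch     = c(1, 2)
)
# Write with ';' as field separator, as expected by read_targets()
write.table(targets_filled, file = "targets.RSRCHR.STUDY.csv",
            sep = ";", row.names = FALSE, quote = FALSE)
```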

3.2 Block 01-Load-QC-Norm-Filt

3.2.1 Load the parameters for this block

#Parameters
source(file.path(paramDir, "01-Load-QC-Norm-Filt.par"))
#Execution parameters
source(file.path(paramDir, "01-Load-QC-Norm-Filt_analysistodo.par"))

For this example study, parameters were set as follows:

(loadQCNormfilt_parameters <- read.table(file.path(paramDir, "01-Load-QC-Norm-Filt.par"), header = FALSE, sep=";", as.is=TRUE))
##                                                                                                                                                                                                                       V1
## 1                                                                                                                                                                                                     dataDirN <- dades 
## 2                                                                                                                                                                                                  celDirN <-  celfiles 
## 3                                                                                                                                                                                                resultsDirN <- results 
## 4                                                                                                                                                                 resultsSummFN <- ResultsSummary_Load-QC-Norm-Filt.txt 
## 5                                                                                                                                                                                 targetsFN <- targets.RSRCHR.STUDY.csv 
## 6                                                                                                                                                                                       targets.fact <- c(Group, Batch) 
## 7                                                                                                                                                                                                 batchcolName <- Batch 
## 8  colorlist <- c(firebrick1, blue, darkgoldenrod1, darkorchid1, lightblue, orange4, seagreen, aquamarine3, red4, chartreuse2, plum, darkcyan, darkolivegreen3, forestgreen, gold,khaki1, lightgreen, gray67, deeppink) 
## 9                                                                                                                                                                      annotPackage <- clariomsmousetranscriptcluster.db
## 10                                                                                                                                                                                              samplestoremove <- CMP.2
## 11                                                                                                                                                                                       normalize.method <- oligo::rma 
## 12                                                                                                                                                                                               filterByVar.fun <- IQR 
## 13                                                                                                                                                                                              filterByVar.thr <- 0.65 
## 14                                                                                                                                                                                      pcaRawData.fact <- targets.fact 
## 15                                                                                                                                                                                            pcaRawData.scale <- FALSE 
## 16                                                                                                                                                                                     pcaNormData.fact <- targets.fact 
## 17                                                                                                                                                                                          pcaNormData.scale <- FALSE  
## 18                                                                                                                                                                                                  pct_threshold <- 0.6
## 19                                                                                                                                                                                    pvcaNormData.fact <- targets.fact 
## 20                                                                                                                                                                                               targets.pvcaFN <- NULL 
## 21                                                                                                                                                                                      hclustRawData.method <- average 
## 22                                                                                                                                                                                     hclustNormData.method <- average

The execution parameters were set as follows:

(loadQCNormfilt_todoparameters <- read.table(file.path(paramDir, "01-Load-QC-Norm-Filt_analysistodo.par"), header = FALSE, sep=";", as.is=TRUE))
##                                V1
## 1           readcelFiles <- TRUE 
## 2           loadRawData <- FALSE 
## 3          loadNormData <- FALSE 
## 4          normalizedata <- TRUE 
## 5                  QCraw <- TRUE 
## 6        boxplotRawData <- QCraw 
## 7            pcaRawData <- QCraw 
## 8        heatmapRawData <- QCraw 
## 9        arrayQMRawData <- QCraw 
## 10                QCnorm <- TRUE 
## 11     boxplotNormData <- QCnorm 
## 12         pcaNormData <- QCnorm 
## 13 pcaNormData.corrbatch <- TRUE 
## 14     heatmapNormData <- QCnorm 
## 15     arrayQMNormData <- QCnorm 
## 16          pvcaNormData <- TRUE 
## 17                SDplot <- TRUE 
## 18         filterByAnnot <- TRUE 
## 19          filterByVar <- FALSE 
## 20      removingofsamples <- TRUE
## 21        save_annot_all <- TRUE 
## 22       save_annot_filt <- TRUE

3.2.2 Load the targets file

The read_targets() function reads the targets file specified in targetsFN (here targets.RSRCHR.STUDY.csv) that was generated in section 3.1.5. The cel file names in column FileName are set as row names, and the variables specified in the targets.fact parameter are converted to factors.

(targets <- read_targets(inputDir=dataDir, targetsFN=targetsFN, targets.fact=targets.fact))
##                       FileName Group ShortName      Colors Batch
## CTL.1 GSM3730337_A1-LT_1Wk.CEL   CTL     CTL.1 aquamarine3     1
## CTL.2 GSM3730338_A2-LT_1Wk.CEL   CTL     CTL.2 aquamarine3     2
## CTL.3 GSM3730339_A3-LT_1Wk.CEL   CTL     CTL.3 aquamarine3     1
## CTL.4 GSM3730340_A4-LT_1Wk.CEL   CTL     CTL.4 aquamarine3     2
## PD1.1 GSM3730341_B1-LT_1Wk.CEL   PD1     PD1.1   firebrick     1
## PD1.2 GSM3730342_B2-LT_1Wk.CEL   PD1     PD1.2   firebrick     2
## PD1.3 GSM3730343_B3-LT_1Wk.CEL   PD1     PD1.3   firebrick     1
## PD1.4 GSM3730344_B4-LT_1Wk.CEL   PD1     PD1.4   firebrick     2
## CMP.1 GSM3730345_E1-LT_1Wk.CEL   CMP     CMP.1 chartreuse3     1
## CMP.2 GSM3730346_E2-LT_1Wk.CEL   CMP     CMP.2 chartreuse3     2
## CMP.3 GSM3730347_E4-LT_1Wk.CEL   CMP     CMP.3 chartreuse3     1
## CMP.4 GSM3730348_E5-LT_1Wk.CEL   CMP     CMP.4 chartreuse3     2

The targets dataframe contains 12 rows and 5 columns. A summary of the variables is shown below:

summary(targets)
##    FileName         Group    ShortName            Colors          Batch
##  Length:12          CTL:4   Length:12          Length:12          1:6  
##  Class :character   PD1:4   Class :character   Class :character   2:6  
##  Mode  :character   CMP:4   Mode  :character   Mode  :character

3.2.3 Load raw data and create ExpressionSet object

Raw data can be read from the cel files or loaded directly from an existing Rda object ("rawData.Rda"). The read_celfiles() function reads the cel files listed in the targets row names (in the same order) from the celDir directory and returns an object of class ExpressionSet.

if (readcelFiles){
    eset_raw <- read_celfiles(inputDir=celDir, targets=targets)
    } else {if (loadRawData) {load(file.path(dataDir, "rawData.Rda"))}}
## Loading required package: pd.clariom.s.mouse.ht
## Loading required package: Biostrings
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which, which.max, which.min
## Loading required package: S4Vectors
## Loading required package: stats4
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## Loading required package: XVector
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
## 
##     strsplit
## Loading required package: RSQLite
## Loading required package: oligoClasses
## Welcome to oligoClasses version 1.46.0
## Loading required package: oligo
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## ================================================================================
## Welcome to oligo version 1.48.0
## ================================================================================
## Loading required package: DBI
## Platform design info loaded.
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730337_A1-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730338_A2-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730339_A3-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730340_A4-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730341_B1-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730342_B2-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730343_B3-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730344_B4-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730345_E1-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730346_E2-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730347_E4-LT_1Wk.CEL
## Reading in : /media/mferrer/Datos/Mireia/Scripts/Paquets_mireia/maUEB/vignettes/celfiles/GSM3730348_E5-LT_1Wk.CEL

The ExpressionSet object contains the following fields:

  • assayData: A matrix of expression values, where the rows represent probe sets (features) and columns represent samples. Row and column names must be unique, and consistent with row names of featureData and phenoData, respectively. The assay data can be retrieved with exprs(). Note that it can be subsetted though not reassigned. To reassign a new expression matrix one must create a new ExpressionSet.

  • phenoData: An AnnotatedDataFrame containing information about each sample as defined in the targets file. The number of rows in phenoData must match the number of columns in assayData. Row names of phenoData must match column names of the matrix in assayData. The phenoData can be retrieved or assigned with pData().

  • annotation: A character describing the platform on which the samples were assayed. This is often the name of a Bioconductor chip annotation package. Can be retrieved or assigned with annotation(). For ClariomS/D it is automatically filled by the platform chip annotation.
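These accessors can be illustrated with a minimal toy ExpressionSet (an assumed sketch built directly with Biobase; the real object for this study, eset_raw, is shown below):

```r
library(Biobase)
# Toy ExpressionSet (3 features x 2 samples) to illustrate the accessors.
m <- matrix(rnorm(6), nrow = 3, ncol = 2,
            dimnames = list(paste0("probe", 1:3), c("CTL.1", "CTL.2")))
pd <- new("AnnotatedDataFrame",
          data = data.frame(Group = c("CTL", "CTL"),
                            row.names = c("CTL.1", "CTL.2")))
es <- ExpressionSet(assayData = m, phenoData = pd,
                    annotation = "pd.clariom.s.mouse.ht")
exprs(es)       # expression matrix (rows = features, columns = samples)
pData(es)       # sample information
annotation(es)  # chip annotation platform
```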

eset_raw
## ExpressionFeatureSet (storageMode: lockedEnvironment)
## assayData: 300304 features, 12 samples 
##   element names: exprs 
## protocolData
##   rowNames: CTL.1 CTL.2 ... CMP.4 (12 total)
##   varLabels: exprs dates
##   varMetadata: labelDescription channel
## phenoData
##   rowNames: CTL.1 CTL.2 ... CMP.4 (12 total)
##   varLabels: FileName Group ... Batch (5 total)
##   varMetadata: labelDescription channel
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: pd.clariom.s.mouse.ht

In this example study, the expression matrix of raw data contains 300304 rows (features) and 12 columns (samples).

From now on, we will work with the ExpressionSet object eset_raw. This means that the phenotypic, annotation and expression data will be accessed from that object.

3.2.4 Quality control (QC) of raw data

This step performs different exploratory analyses (boxplot, PCA, heatmap of sample distances and hierarchical clustering) to graphically inspect the quality of the samples from raw data. In addition, a QC report is generated using the arrayQualityMetrics package.

3.2.4.1 Boxplot of raw data

Creates a boxplot of the log2-transformed intensity values of the raw data and saves it as a PDF in the results directory.

if (boxplotRawData){
   qc_boxplot(data=eset_raw, group="Group", group.color="Colors", samplenames="ShortName", outputDir=resultsDir, label="RawData")
}

3.2.4.2 PCA of raw data

Performs a principal component analysis of the raw data and creates PCA plots with samples colored by each factor variable specified in the pcaRawData.fact parameter. The PCA plot can be produced in 2 and/or 3 dimensions (representing the first 2 or 3 principal components) by setting the dim parameter. The loads (cumulative percentage of variance) of each principal component are returned.

if (pcaRawData){
    loadsPCAraw <- qc_pca1(data=exprs(eset_raw), scale=pcaRawData.scale, pca2D_factors=pcaRawData.fact, targets=targets, col.group="Colors", colorlist=colorlist, names=targets$ShortName, outputDir=resultsDir, label="RawData")
}

# For a 3-dimensional PCA plot:
if (pcaRawData){
    loadsPCAraw <- qc_pca1(data=exprs(eset_raw), pca3D=TRUE, scale=pcaRawData.scale, dim=c(1,2,3), pca3D_factors="Group", targets=targets, col.group="Colors", colorlist=colorlist, names=targets$ShortName, outputDir=resultsDir, label="RawData")
}

3.2.4.3 Heatmap of sample distances and hierarchical clustering of raw data

if (heatmapRawData){
    qc_hc(data=exprs(eset_raw), hclust.method=hclustRawData.method, names=targets$ShortName, cexRow = 0.6, cexCol = 0.6, rect=TRUE, numclusters=2, outputDir=resultsDir, label="RawData")
}

## png 
##   2

3.2.4.4 ArrayQualityMetrics of raw data

#Note: Set 'intgroup' according to factors to be colored in heatmap. Other plots will be colored according to only the first factor
#required for Affymetrix QC
if (arrayQMRawData) {
    require(arrayQualityMetrics) 
    arrayQualityMetrics(eset_raw, outdir = file.path(resultsDir, "QCDir.Raw"), force=TRUE, intgroup=targets.fact)
}

3.2.5 QC of raw data in one step

All of the QC plots above can be generated in one step with the qc_all() function:

qc_all(data=eset_raw, group="Group", group.color="Colors", samplenames="ShortName", factors=pcaRawData.fact, pca_scale=pcaRawData.scale, colorlist=colorlist, hc.method=hclustRawData.method, label="RawData", outputDir=resultsDir, summaryFN=resultsSummFN, doboxplot=TRUE, dopca=TRUE, dopvca=FALSE, dohc=TRUE, doarrayQMreport=FALSE)

3.2.6 Normalization

Data can be normalized with the normalization() function or loaded directly from a previously generated Rda object (normData.Rda). The normalization() function normalizes raw data using one of the available methods (currently the rma method from the oligo package). The RMA method performs background subtraction, quantile normalization and summarization (via median polish) (ref). It returns a FeatureSet of normalized data and saves the normalized expression values in a datasheet (csv and/or xls format) in the results folder.

if (loadNormData) {
    load(file.path(dataDir, "normData.Rda"))
} else if (normalizedata) {
    eset_norm <- normalization(data=eset_raw, norm.method="rma", annotPkg=annotPackage, outputFN="Normalized.All", outputDir=resultsDir)
}
## Background correcting
## Normalizing
## Calculating Expression

3.2.7 Save annotations of summarized probes

The save_annotations() function constructs an aafTable object for a set of probe ids using the aafTableAnn function from the annaffy package. The annotations are returned as an aafTable object and are also saved for all summarized probes as csv and html files. Note: in exon studies, where probes are summarized at the probeset level but not the transcript level, probe annotations cannot be recovered with this function.

# Note: for unknown reasons an error occurs when this is executed as a function without having previously loaded the packages (?)
require(annotPackage, character.only=TRUE)
require(annaffy)
if (save_annot_all){normData_annot <- save_annotations(data=rownames(eset_norm), annotPkg=annotPackage, outputFN="Annotations.AllGenes", saveHTML=TRUE, title="Annotations for all genes", outputDir=resultsDir)}

3.2.8 Quality control of normalized data

Here, the same functions used for the quality control of raw data can be used to assess data quality after normalization. In addition, a PVCA (principal variance component analysis) can be performed to further assess the sources of batch effects in the normalized data.

3.2.8.1 Boxplot of normalized data

Creates a boxplot of the normalized data (already in log2 scale) and saves it as a PDF in the results directory.

if (boxplotNormData){
   qc_boxplot(data=eset_norm, group="Group", col="Colors", names="ShortName", outputDir=resultsDir, label="NormData")
}
## Warning in .local(x, ...): Argument 'which' ignored (not meaningful for
## ExpressionSet)

## Warning in .local(x, ...): Argument 'which' ignored (not meaningful for
## ExpressionSet)

3.2.8.2 PCA of normalized data

Performs a principal component analysis of the normalized data and creates PCA plots with samples colored by each factor variable specified in the pcaNormData.fact parameter. The loads (cumulative percentage of variance) of each principal component are returned.

if (pcaNormData){
    loadsPCAnorm <- qc_pca1(data=exprs(eset_norm), scale=pcaNormData.scale, pca2D_factors=pcaNormData.fact, targets=targets, col.group="Colors", colorlist=colorlist, names=targets$ShortName, outputDir=resultsDir, label="NormData")
}

If there are batch effects, the PCA can be repeated after removing them with the removeBatchEffect function from the limma package. Here this step is performed for illustration purposes only, since the Batch variable is artificial. We use the same function as for the PCA, but specify batchRemove=TRUE and the name of the Batch variable in the batchFactors parameter.

if (pcaNormData.corrbatch){
    loadsPCAnorm.corrbatch <- qc_pca1(exprs(eset_norm), scale=pcaNormData.scale, pca2D_factors=pcaNormData.fact, targets=targets, col.group="Colors", colorlist=colorlist, names=targets$ShortName, outputDir=resultsDir, label="NormData", batchRemove=TRUE, batchFactors="Batch", size = 1.5, glineas = 0.25)
}

3.2.8.3 Heatmap of sample distances and hierarchical clustering of normalized data

if (heatmapNormData){
    qc_hc(data=exprs(eset_norm), hclust.method=hclustNormData.method, names=targets$ShortName, cexRow = 0.6, cexCol = 0.6, rect=TRUE, numclusters=2, outputDir=resultsDir, label="NormData")
}

## png 
##   2

3.2.8.4 ArrayQualityMetrics of normalized data

#Note: Set 'intgroup' according to factors to be colored in heatmap. Other plots will be colored according to only the first factor
require(arrayQualityMetrics) #required for Affymetrix QC
if (arrayQMNormData) arrayQualityMetrics(eset_norm, outdir = file.path(resultsDir, "QCDir.Norm"), force=TRUE, intgroup=targets.fact)

3.2.8.5 PVCA of normalized data

The PVCA approach can be used as a screening tool to determine which sources of variability (biological, technical or other) are most prominent in a given microarray data set.

if (pvcaNormData){
    qc_pvca(data=eset_norm, factors=pvcaNormData.fact, targetsPVCA=targets.pvcaFN, pct_threshold=pct_threshold, label="NormData", outputDir=resultsDir, summaryFN=resultsSummFN)
}

## $dat
##            [,1]      [,2]       [,3]      [,4]
## [1,] 0.04757598 0.2911197 0.01818642 0.6431179
## 
## $label
## [1] "Group:Batch" "Group"       "Batch"       "resid"

3.2.9 QC of normalized data in one step

All of the QC plots above can be generated in one step with the qc_all() function:

qc_all(data=eset_norm, group="Group", group.color="Colors", samplenames="ShortName", factors=pcaNormData.fact, factorspvca=pvcaNormData.fact, pca_scale=pcaNormData.scale, colorlist=colorlist, hc.method=hclustNormData.method, label="NormData", outputDir=resultsDir, summaryFN=resultsSummFN, doboxplot=TRUE, dopca=TRUE, dopvca=TRUE, dohc=TRUE, doarrayQMreport=FALSE)

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

## $dat
##            [,1]      [,2]       [,3]      [,4]
## [1,] 0.04757598 0.2911197 0.01818642 0.6431179
## 
## $label
## [1] "Group:Batch" "Group"       "Batch"       "resid"

3.2.10 Sample exclusion

The quality control may reveal outlier samples that should be removed before proceeding with the differential expression analysis. Here, sample CMP.2 appears as a putative outlier in both the hierarchical clustering and the PCA. Therefore, this sample will be discarded from the analysis.

To remove samples, the whole ExpressionSet object can be subsetted as follows:

samplestoremove <- "CMP.2"
eset_raw.f <- eset_raw[,-which(colnames(eset_raw)%in%samplestoremove)]
dim(eset_raw)
## Features  Samples 
##   300304       12
dim(eset_raw.f)
## Features  Samples 
##   300304       11

Note that the phenoData contained in that object has also been subsetted:

pData(eset_raw.f)
##                       FileName Group ShortName      Colors Batch
## CTL.1 GSM3730337_A1-LT_1Wk.CEL   CTL     CTL.1 aquamarine3     1
## CTL.2 GSM3730338_A2-LT_1Wk.CEL   CTL     CTL.2 aquamarine3     2
## CTL.3 GSM3730339_A3-LT_1Wk.CEL   CTL     CTL.3 aquamarine3     1
## CTL.4 GSM3730340_A4-LT_1Wk.CEL   CTL     CTL.4 aquamarine3     2
## PD1.1 GSM3730341_B1-LT_1Wk.CEL   PD1     PD1.1   firebrick     1
## PD1.2 GSM3730342_B2-LT_1Wk.CEL   PD1     PD1.2   firebrick     2
## PD1.3 GSM3730343_B3-LT_1Wk.CEL   PD1     PD1.3   firebrick     1
## PD1.4 GSM3730344_B4-LT_1Wk.CEL   PD1     PD1.4   firebrick     2
## CMP.1 GSM3730345_E1-LT_1Wk.CEL   CMP     CMP.1 chartreuse3     1
## CMP.3 GSM3730347_E4-LT_1Wk.CEL   CMP     CMP.3 chartreuse3     1
## CMP.4 GSM3730348_E5-LT_1Wk.CEL   CMP     CMP.4 chartreuse3     2
targets.f <- pData(eset_raw.f)
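An ExpressionSet keeps the expression matrix and the phenotypic data synchronized when subsetting. A minimal base-R sketch of the equivalent bookkeeping, using a toy matrix and illustrative sample names:

```r
# Toy stand-ins for exprs() and pData(); names are illustrative only
expr  <- matrix(rnorm(5 * 4), nrow = 5,
                dimnames = list(paste0("probe", 1:5),
                                c("CTL.1", "CTL.2", "CMP.1", "CMP.2")))
pheno <- data.frame(Group = c("CTL", "CTL", "CMP", "CMP"),
                    row.names = colnames(expr))

samplestoremove <- "CMP.2"
keep <- !(colnames(expr) %in% samplestoremove)
expr.f  <- expr[, keep, drop = FALSE]   # drop the outlier column
pheno.f <- pheno[keep, , drop = FALSE]  # keep phenotype rows in sync
dim(expr.f)  # 5 features, 3 samples
```

Subsetting the ExpressionSet itself, as above, performs both steps at once.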

After sample removal, the whole QC and normalization process should be repeated without those samples.

Hence, the data are normalized again with the new dataset (here we do not change the outputFN parameter, so the files Normalized.All.xls and Normalized.All.csv are overwritten):

eset_norm.f <- normalization(data=eset_raw.f, norm.method="rma", annotPkg=annotPackage, outputFN="Normalized.All", outputDir=resultsDir)
## Background correcting
## Normalizing
## Calculating Expression

The QC is performed again with the normalized data after sample exclusion. We indicate that the plots correspond to the dataset after sample exclusion by adding .f to the label parameter:

qc_all(data=eset_norm.f, group="Group", group.color="Colors", samplenames="ShortName", factors=pcaNormData.fact, factorspvca=pvcaNormData.fact, pca_scale=pcaNormData.scale, colorlist=colorlist, hc.method=hclustNormData.method, label="NormData.f", outputDir=resultsDir, summaryFN=resultsSummFN, doboxplot=TRUE, dopca=TRUE, dopvca=FALSE, dohc=TRUE, doarrayQMreport=FALSE)

If there were batch effects, we should also repeat the PCA removing batch effects with the new dataset:

loadsPCAnorm.f.corrbatch <- qc_pca1(exprs(eset_norm.f), scale=pcaNormData.scale, pca2D_factors=pcaNormData.fact, targets=targets.f, col.group="Colors", colorlist=colorlist, names=targets.f$ShortName, outputDir=resultsDir, label="NormData.f", batchRemove=TRUE, batchFactors="Batch", size = 1.5, glineas = 0.25)

3.2.11 Filtering

Some filtering can be performed prior to differential expression analysis to remove the Affymetrix control probes, as well as probesets with missing or duplicated EntrezIDs (annotation-based filtering), and/or to remove probesets with low variance (variance-based filtering). These options are controlled by the filterByAnnot and filterByVar parameters, respectively. The resulting object is named eset_filtered.

3.2.11.1 SD plot

Function sdplot() can be used to plot the standard deviations of all probesets in the array.

if (SDplot){
    sdplot(data=exprs(eset_norm.f), var_thr=filterByVar.thr, label="NormData", outputDir=resultsDir)
}
## png 
##   2

3.2.11.2 Filtering of normalized data

eset_filtered <- filtering(data=eset_norm.f, outputFN="Normalized.Filtered", outputDir=resultsDir, summaryFN=resultsSummFN,
                           feature.exclude="^AFFX",require.entrez=filterByAnnot, remove.dupEntrez = filterByAnnot,
                           var.filter=filterByVar, var.func = filterByVar.fun, var.cutoff = filterByVar.thr, filterByQuantile=TRUE,
                           require.GOBP = FALSE, require.GOCC = FALSE, require.GOMF = FALSE)
## 
## 
dim(eset_filtered)
## Features  Samples 
##    20296       11
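As a rough sketch of what quantile-based variance filtering does (the exact behaviour of filtering() depends on its implementation; the IQR statistic and the 0.5 quantile below are illustrative assumptions):

```r
set.seed(1)
expr <- matrix(rnorm(1000 * 6), nrow = 1000)   # toy normalized expression matrix
feat.var <- apply(expr, 1, IQR)                # variability of each feature
cutoff   <- quantile(feat.var, probs = 0.5)    # treat the threshold as a quantile
expr.filt <- expr[feat.var > cutoff, ]         # keep the most variable half
nrow(expr.filt)
```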

3.2.11.3 Save annotations of filtered data

#Not sure why this throws an error (Error: no such table: go.go_term); it only happens when run as a function without having loaded the packages beforehand (?)
require(annotPackage, character.only=TRUE)
require(annaffy)
## Loading required package: annaffy
## Loading required package: GO.db
## Loading required package: KEGG.db
## 
## KEGG.db contains mappings based on older data because the original
##   resource was removed from the the public domain before the most
##   recent update was produced. This package should now be considered
##   deprecated and future versions of Bioconductor may not have it
##   available.  Users who want more current data are encouraged to look
##   at the KEGGREST or reactome.db packages
if (save_annot_filt){normData.filt_annot <- save_annotations(data=rownames(eset_filtered), annotPkg=annotPackage, outputFN="Annotations.Filtered", saveHTML=TRUE, title="Annotations for filtered genes", outputDir=resultsDir)}

3.2.12 Save data from this block (normData.Rda object)

# When you save your data, use save(MyObject, file = "MyObject.RData", version = 2) to maintain back-compatibility and avoid the warning. (otherwise, others using R < 3.5.0 won't be able to load your saved files.)

if (!is.null(samplestoremove)){
    if (normalizedata) {
        save(eset_raw, eset_norm, eset_raw.f, eset_norm.f, eset_filtered, targets, targets.f, file=file.path(dataDir,"normData.Rda"), version=2)
    } else {
        save(eset_raw, eset_raw.f, targets, targets.f, file=file.path(dataDir,"rawData.Rda"), version=2)
    }
} else {
    if (normalizedata) {
        save(eset_raw, eset_norm, eset_filtered, targets, file=file.path(dataDir,"normData.Rda"), version=2)
    } else {
        save(eset_raw, targets, file=file.path(dataDir,"rawData.Rda"), version=2)
    }
}

3.3 Block 02-Differential Expression Analysis (DEA)

3.3.1 Background

The analysis to select differentially expressed genes will be based on adjusting a linear model with empirical Bayes moderation of the variance. This is a technique specifically developed for microarray data analysis by Gordon K Smyth @smyth2004. Linear modeling is the same as ordinary analysis of variance or multiple regression except that a model is fitted for every gene. Of note, limma estimates the gene-specific variance using information from all samples in all groups, even if not all groups participate in the pairwise comparisons. This is necessary in 'omics studies with limited replication, and outweighs any supposed disadvantage of differences in variances between groups. For statistical analysis and assessing differential expression, limma uses an empirical Bayes method to moderate the standard errors of the estimated log-fold changes. This results in more stable inference and improved power, especially for experiments with small numbers of arrays.

The main purpose of fitting a linear model to the data is to estimate the variability in the data, hence the systematic part needs to be modelled so it can be distinguished from random variation. The model is specified by the design matrix. Each row of the design matrix corresponds to an array in your experiment and each column corresponds to a coefficient that is used to describe the RNA sources in your experiment. Then, a contrasts step is performed so that you can take the initial coefficients and compare them in as many ways as you want to answer any questions you might have, regardless of how many or how few these might be. (ref: limma user's guide)

General recommendations:

  • Since limma estimates the gene-specific variance using information from all samples in all groups, the design matrix should be constructed with all the samples (arrays) of the experiment, even if not all groups participate in the pairwise comparisons. However, subsetting may be reasonable in some cases, e.g. when two groups are very different (e.g. different tissues)
  • The covariates included in the linear model should not be correlated (e.g. if the Group variable already integrates the information of Sex, no additional Sex covariate should be added to the model). Adding correlated variables may leave the design matrix without full rank.
  • If batch effects were detected during Quality Control, the Batch variable should be added to the model as a systematic blocking factor
  • When samples are derived from the same Subject, this could be taken into account as a random or systematic effect. In general, with well balanced designs, including the Subject as a systematic blocking factor is recommended (paired analysis). For unbalanced designs or to make comparisons both within and between subjects, it is necessary to treat Subject as a random effect. This can be done in limma using the duplicateCorrelation function. (ref. limma user's guide chapter 9.7)
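The correlated-covariates caveat above can be illustrated with a toy design: if Sex is fully determined by Group, adding it contributes a column but no new information, so the matrix loses full rank.

```r
targets <- data.frame(Group = factor(c("A", "A", "B", "B")),
                      Sex   = factor(c("F", "F", "M", "M")))  # confounded with Group

X1 <- model.matrix(~ 0 + Group, data = targets)
X2 <- model.matrix(~ 0 + Group + Sex, data = targets)

qr(X1)$rank == ncol(X1)  # TRUE: full rank
qr(X2)$rank == ncol(X2)  # FALSE: SexM duplicates the GroupB column
```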

References on how to build design matrices:

3.3.2 Load the parameters for this block and set directories

#Parameters
source(file.path(paramDir, "02-DEA.par"))
#Execution parameters
source(file.path(paramDir, "02-DEA_analysistodo.par"))

For this example study, parameters were set as follows:

(loadDEA_parameters <- read.table(file.path(paramDir, "02-DEA.par"), header = FALSE, sep=";", as.is=TRUE))
##                                                   V1
## 1                                 dataDirN <- dades 
## 2                            resultsDirN <- results 
## 3           resultsSummFN <- ResultsSummary_DEA.txt 
## 4                       inputDEARda <- normData.Rda 
## 5                    eset_analysis <- eset_filtered 
## 6  annotPackage <- clariomsmousetranscriptcluster.db
## 7                          design.mainfact <- Group 
## 8                            design.cofact <- Batch 
## 9                             blocking.fact <- NULL 
## 10                               design.num <- NULL 
## 11                          design.num.targ <- NULL 
## 12             contrastsv <- c(PD1vsCTL = PD1 - CTL,
## 13                             CMPvsCTL = CMP - CTL,
## 14                            PD1vsCMP = PD1 - CMP) 
## 15                       ncomp <- length(contrastsv)
## 16       toptable_padjust_method <- c(none, BH, BH) 
## 17                       volc_logFC <- c(0.5, 1, 1) 
## 18    volc_pval <- c(P.Value, adj.P.Val, adj.P.Val) 
## 19                    volc_pval.thr <- rep(0.05, 3) 
## 20                        volc_x0 <- rep(-3, ncomp) 
## 21                        volc_x1 <- rep(+3, ncomp) 
## 22                         volc_y0 <- rep(0, ncomp) 
## 23                         volc_y1 <- rep(5, ncomp)

The execution parameters were set as follows:

(loadDEA_todoparameters <- read.table(file.path(paramDir, "02-DEA_analysistodo.par"), header = FALSE, sep=";", as.is=TRUE))
##                          V1
## 1 ReportHTMLTopTab <- TRUE 
## 2         Volcanos <- TRUE

Since directories could change from block to block, we set them again for this block:

dataDir <- file.path(workingDir, dataDirN)
resultsDir <- file.path(workingDir, resultsDirN)

3.3.3 Data for DEA

The Rda file generated in Block 01-Load-QC-Norm-Filt will be used as input for DEA:

load(file.path(dataDir, inputDEARda)) 

We assign the ExpressionSet object to be used for the analysis to an R object (its name is specified as a string in the eset_analysis parameter):

eset_analysis <- eval(parse(text=eset_analysis))
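eval(parse(text = ...)) resolves the string stored in the parameter to the object of that name; for a plain object name, get() is an equivalent base-R alternative (toy object below):

```r
eset_filtered <- list(note = "toy stand-in for the real ExpressionSet")
param <- "eset_filtered"                   # as read from the .par file
eset_analysis <- eval(parse(text = param))
identical(eset_analysis, get(param))       # TRUE
```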

3.3.4 Building the linear model

3.3.4.1 Design matrix

The design matrix is created from the phenotypic data, which can be retrieved from the ExpressionSet object using pData(eset_analysis) or from the targets object. The safer option is to work with the ExpressionSet object instead of using the expression data and targets file separately. Note that the design matrix is constructed based only on the sample names in targets, and the fit is then done based on the indexed positions.

Function dea_lmdesign() can be used to construct a design matrix. The main factor representing the treatment group should be provided in the group parameter, and the covariates or systematic effects to take into account are introduced in the covariates parameter. The design matrix is constructed according to the formula ~ 0 + group + covariates. This design is flexible and accommodates the different comparisons specified by the contrasts matrix. However, alternative designs can also be created by providing a formula to the fmla parameter.

In addition, if random effects are to be taken into account, this will be specified in blocking.fact parameter and taken into account via duplicateCorrelation() function during the model fit (see next section).

In this example, the artificially created Batch factor will be added to the model for illustration purposes only.

#parameters specified in file 02-DEA.par:
design.mainfact
## [1] "Group"
design.cofact
## [1] "Batch"
blocking.fact
## NULL
(design <- dea_lmdesign(targets=pData(eset_analysis), sampleNames="ShortName", data=eset_analysis,
                     group=design.mainfact, cov.fact=design.cofact, cov.num=design.num, fmla=NULL, 
                     summaryFN=resultsSummFN, outputDir=resultsDir))
##       CTL PD1 CMP Batch2
## CTL.1   1   0   0      0
## CTL.2   1   0   0      1
## CTL.3   1   0   0      0
## CTL.4   1   0   0      1
## PD1.1   0   1   0      0
## PD1.2   0   1   0      1
## PD1.3   0   1   0      0
## PD1.4   0   1   0      1
## CMP.1   0   0   1      0
## CMP.3   0   0   1      0
## CMP.4   0   0   1      1
## attr(,"assign")
## [1] 1 1 1 2
## attr(,"contrasts")
## attr(,"contrasts")$Group
## [1] "contr.treatment"
## 
## attr(,"contrasts")$Batch
## [1] "contr.treatment"
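The design above matches what base R's model.matrix() produces for the formula ~ 0 + Group + Batch; a minimal reproduction with the example's factor levels (the column renaming is cosmetic):

```r
targets <- data.frame(
  Group = factor(rep(c("CTL", "PD1", "CMP"), times = c(4, 4, 3)),
                 levels = c("CTL", "PD1", "CMP")),
  Batch = factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2))
)
design <- model.matrix(~ 0 + Group + Batch, data = targets)
colnames(design) <- sub("^Group", "", colnames(design))
colnames(design)  # "CTL" "PD1" "CMP" "Batch2"
```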

3.3.4.2 Linear model fit

Function dea_lmfit() can be used to fit the linear model to the data, as specified by the design matrix. It uses the lmFit function from the limma package. If random effects are to be taken into account (i.e. the blocking.fact parameter is not NULL), this should be specified in the block.cor parameter; the correlation between measurements made on the same block is then estimated with the duplicateCorrelation function from the limma package and used in the model fit. The calculated consensus correlation can be accessed from the returned fit object (fit$correlation).

fit <- dea_lmfit(data=eset_analysis, targets=pData(eset_analysis), design=design, block.cor=NULL, summaryFN=resultsSummFN, outputDir=resultsDir)

3.3.5 Compute the contrasts and moderated t-tests

The contrasts to be computed are defined in the contrastsv parameter of the 02-DEA.par file, where the contrast names are the coefficients to be extracted from the fit object. Function dea_compare() constructs the contrast matrix and computes estimated coefficients and standard errors for the given set of contrasts. In addition, it performs an empirical Bayes moderation of the standard errors. If no moderation of the standard errors is to be performed (not recommended), specify moderated=FALSE. The returned object is named fit.main.

fit.main <- dea_compare(fit=fit, contrasts=contrastsv, design=design, moderated=TRUE, summaryFN=resultsSummFN, outputDir=resultsDir)
names(fit.main)
##  [1] "coefficients"     "rank"             "assign"           "qr"              
##  [5] "df.residual"      "sigma"            "cov.coefficients" "stdev.unscaled"  
##  [9] "Amean"            "method"           "design"           "contrasts"       
## [13] "df.prior"         "s2.prior"         "var.prior"        "proportion"      
## [17] "s2.post"          "t"                "df.total"         "p.value"         
## [21] "lods"             "F"                "F.p.value"
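The three contrasts translate into a contrast matrix over the design coefficients (limma's makeContrasts builds the same structure); a hand-rolled base-R version for illustration:

```r
cont.matrix <- cbind(
  PD1vsCTL = c(-1, 1,  0, 0),  # PD1 - CTL
  CMPvsCTL = c(-1, 0,  1, 0),  # CMP - CTL
  PD1vsCMP = c( 0, 1, -1, 0)   # PD1 - CMP
)
rownames(cont.matrix) <- c("CTL", "PD1", "CMP", "Batch2")
colSums(cont.matrix)  # each contrast's coefficients sum to zero
```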

3.3.6 TopTables

Assign name of comparisons (contrasts) to listofcoef variable:

(listofcoef <- colnames(fit.main))
## [1] "PD1vsCTL" "CMPvsCTL" "PD1vsCMP"

Function dea_toptab obtains the toptables for the comparisons analyzed. To create HTML tables, use html_report=TRUE (execution will take longer).

#version with the html tables as they come by default (only shows adj pval or pval)
listofcsv <- dea_toptab(listofcoef=listofcoef, fit.main=fit.main, eset=eset_analysis, padjust.method="fdr", 
                        html_report=ReportHTMLTopTab, 
                        html_ntop=500, html_group="Group", html_padjust_method = toptable_padjust_method,
                        outputDir=resultsDir)
head(listofcsv[[1]])
##                          ProbesetID Gene.Symbol EntrezID      logFC  AveExpr
## TC0200003103.mm.2 TC0200003103.mm.2       Mastl    67121 -0.7743359 6.578679
## TC0500000877.mm.2 TC0500000877.mm.2       Stbd1    52331  0.7601741 2.600744
## TC0200003303.mm.2 TC0200003303.mm.2        Lcn2    16819 -2.0423248 5.720048
## TC1700000991.mm.2 TC1700000991.mm.2      Zfp959   224893 -0.7598567 6.173209
## TC1100001853.mm.2 TC1100001853.mm.2        Sox9    20682  0.6264528 7.191829
## TC0300000694.mm.2 TC0300000694.mm.2       Fcrl1   229499  0.7608916 4.444347
##                           t      P.Value adj.P.Val           B    CTL.1
## TC0200003103.mm.2 -6.581913 3.475403e-05   0.50501  0.99628871 6.967894
## TC0500000877.mm.2  5.344704 2.147716e-04   0.50501 -0.01539208 1.991386
## TC0200003303.mm.2 -5.334615 2.181767e-04   0.50501 -0.02474273 6.888696
## TC1700000991.mm.2 -5.267342 2.423942e-04   0.50501 -0.08757404 6.569216
## TC1100001853.mm.2  5.258972 2.456002e-04   0.50501 -0.09544905 6.903068
## TC0300000694.mm.2  5.154835 2.894639e-04   0.50501 -0.19452520 4.339100
##                      CTL.2    CTL.3    CTL.4    PD1.1    PD1.2    PD1.3
## TC0200003103.mm.2 7.022167 7.054732 7.091516 6.087973 6.264468 6.286478
## TC0500000877.mm.2 1.965754 2.345005 2.278587 2.816463 2.955682 3.105219
## TC0200003303.mm.2 6.270977 5.658767 6.677155 4.459839 4.391253 3.440885
## TC1700000991.mm.2 6.696533 6.520853 6.525938 5.967611 5.956430 5.513731
## TC1100001853.mm.2 6.840660 6.837817 6.666565 7.452319 7.572522 7.359331
## TC0300000694.mm.2 4.023021 4.164470 3.923459 5.092097 4.900643 5.031838
##                      PD1.4    CMP.1    CMP.3    CMP.4
## TC0200003103.mm.2 6.400044 6.327643 6.289229 6.573320
## TC0500000877.mm.2 2.744064 2.703738 2.893336 2.808951
## TC0200003303.mm.2 5.034319 6.340505 5.728454 8.029680
## TC1700000991.mm.2 5.835340 6.136566 5.826947 6.356136
## TC1100001853.mm.2 7.369749 7.418443 7.327211 7.362429
## TC0300000694.mm.2 4.469038 4.475632 4.142661 4.325857
#version with the html tables modified so that they show all the statistics. Not run because it takes a long time. To be improved
listofcsv <- dea_toptab1(listofcoef=listofcoef, fit.main=fit.main, eset=eset_analysis, padjust.method="fdr", 
                        html_report=TRUE, #ReportHTMLTopTab, 
                        html_ntop=500, html_group="Group", 
                        outputDir=resultsDir)
names(listofcsv)
head(listofcsv[[1]])

3.3.7 Results summary

Function dea_summary_ngc can be used to summarize the results of differential expression. It returns a table with the number of features that are differentially expressed under different statistical thresholds. Features are catalogued as up- or down-regulated if their logFC is above or below the selected logFC threshold, respectively.

(numGenesChanged <- dea_summary_ngc(listofcoef=listofcoef, listofcsv=listofcsv, B_thr=c(0), Pval_thr=c(0.01,0.05,0.1), adjPval_thr=c(0.01,0.05,0.15,0.25), logFC_thr=0, outputDir=resultsDir))
##                  Stat PD1vsCTL CMPvsCTL PD1vsCMP
## 1               B0_Up        0      117       10
## 2             B0_Down        1       14       77
## 3    adj.P.Val0.01_Up        0        3        0
## 4  adj.P.Val0.01_Down        0        0        7
## 5    adj.P.Val0.05_Up        0       50        0
## 6  adj.P.Val0.05_Down        0        1       27
## 7    adj.P.Val0.15_Up        0      274       10
## 8  adj.P.Val0.15_Down        0       94       81
## 9    adj.P.Val0.25_Up        0      669       25
## 10 adj.P.Val0.25_Down        0      416      144
## 11     P.Value0.01_Up      183      583      147
## 12   P.Value0.01_Down      194      319      343
## 13     P.Value0.05_Up      820     1244      705
## 14   P.Value0.05_Down      952     1258      979
## 15      P.Value0.1_Up     1425     1815     1342
## 16    P.Value0.1_Down     1811     2190     1564

For an extended table involving different logFC thresholds, use function dea_summary_ngc_ext instead. The output will be a list with the summary tables obtained for each comparison.

ngc_list <- dea_summary_ngc_ext(listofcoef=listofcoef, listofcsv=listofcsv, B_thr.e=c(0), Pval_thr.e=c(0.01,0.05,0.1), adjPval_thr.e=c(0.01,0.05,0.15,0.25), logFC_thr.e=c(0,0.5,1,1.5,2), outputDir=resultsDir)
names(ngc_list)
## [1] "PD1vsCTL" "CMPvsCTL" "PD1vsCMP"
head(ngc_list[["PD1vsCTL"]])
##   stat_type stat_thr logFC_thr NumGenes NumGenesUp NumGenesDown
## 1         B     0.00       0.0        1          0            1
## 2         B     0.00       0.5        1          0            1
## 3         B     0.00       1.0        0          0            0
## 4         B     0.00       1.5        0          0            0
## 5         B     0.00       2.0        0          0            0
## 6 adj.P.Val     0.01       0.0        0          0            0

The two functions mentioned above are combined into a single function, dea_summary_ngc1, whose extended parameter specifies whether the extended tables are to be produced. The output is a list of two elements: the first contains the simple numGenesChanged data frame, and the second a list with the extended tables, if produced.

numGenesChanged_all <- dea_summary_ngc1(listofcoef=listofcoef, listofcsv=listofcsv, B_thr=c(0), Pval_thr=c(0.01,0.05,0.1), adjPval_thr=c(0.01,0.05,0.15,0.25), logFC_thr=0, extended=TRUE, B_thr.e=c(0), Pval_thr.e=c(0.01,0.05,0.1), adjPval_thr.e=c(0.01,0.05,0.15,0.25), logFC_thr.e=c(0,0.5,1,1.5,2), outputDir=resultsDir)
names(numGenesChanged_all)
## [1] "numGenesChanged"          "numGenesChanged_extended"

3.3.8 Volcano plots

The results of the differential expression analysis obtained for each comparison can be visualized in a volcano plot using the dea_volcanoplot() function. The volcano plot is a type of scatterplot that shows statistical significance (p-value) versus magnitude of change (fold change). It enables quick visual identification of genes with large fold changes that are also statistically significant. The top significant genes are shown in purple according to the thresholds of logFC and p-value specified in the parameters volc_logFC, volc_pval and volc_pval.thr, with defaults of absolute logFC above 1 and adjusted p-value below 0.05. In addition, gene symbols are shown for the most significant genes, up to a maximum of 20. The plots are saved as PDF into the specified outputDir and shown in the graphical device.
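The classification underlying the plot can be sketched as follows (toy values; thresholds mirror the defaults mentioned above):

```r
toptab <- data.frame(logFC     = c(2.0, -1.4, 0.3, 1.8),
                     adj.P.Val = c(0.01, 0.20, 0.01, 0.04))
toptab$signif    <- abs(toptab$logFC) > 1 & toptab$adj.P.Val < 0.05
toptab$negLog10P <- -log10(toptab$adj.P.Val)  # the volcano's y axis; logFC is the x axis
toptab[toptab$signif, ]
```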

dea_volcanoplot(listofcsv=listofcsv, listofcoef=listofcoef, volc_logFC=rep(1,length(listofcoef)), volc_pval=c(rep("adj.P.Val", length(listofcoef))), volc_pval.thr=rep(0.05, length(listofcoef)), volc_x0=rep(-3, length(listofcoef)), volc_x1=rep(+3, length(listofcoef)), volc_y0=rep(0, length(listofcoef)), volc_y1=rep(10, length(listofcoef)), n=6, cols=2, outputDir=resultsDir, label="")
## NULL
## NULL
## NULL

## NULL
## NULL
## NULL
## [[1]]

## 
## [[2]]

## 
## [[3]]

3.3.9 All DEA in one function

All the steps performed in Block 02-DEA can be executed in a single step using the dea_all() function. The result is a list with objects corresponding to each step of the analysis:

dea.results <- dea_all(data=eset_analysis, targets=pData(eset_analysis), sampleNames="ShortName", group=design.mainfact, cov.fact=design.cofact, cov.num=design.num, fmla=NULL, block.cor=blocking.fact, contrastsv=contrastsv, moderated=TRUE, padjust.method="fdr", html_report=FALSE, html_ntop=500, html_group="Group", B_thr=c(0), Pval_thr=c(0.01,0.05,0.1), adjPval_thr=c(0.01,0.05,0.15,0.25), logFC_thr=0, extended=TRUE, B_thr.e=c(0), Pval_thr.e=c(0.01,0.05,0.1), adjPval_thr.e=c(0.01,0.05,0.15,0.25), logFC_thr.e=c(0,0.5,1,1.5,2), dovolcanoplot=TRUE, volc_logFC=rep(1,length(contrastsv)), volc_pval=c(rep("adj.P.Val", length(contrastsv))), volc_pval.thr=rep(0.05, length(contrastsv)), volc_x0=rep(-3, length(contrastsv)), volc_x1=rep(+3, length(contrastsv)), volc_y0=rep(0, length(contrastsv)), volc_y1=rep(10, length(contrastsv)), n=6, cols=2, label="", summaryFN=resultsSummFN, outputDir=resultsDir)
## NULL
## NULL
## NULL

## NULL
## NULL
## NULL
names(dea.results)
## [1] "fit"             "fit.main"        "listofcsv"       "numGenesChanged"
dea.results$fit.main$design
##       CTL PD1 CMP Batch2
## CTL.1   1   0   0      0
## CTL.2   1   0   0      1
## CTL.3   1   0   0      0
## CTL.4   1   0   0      1
## PD1.1   0   1   0      0
## PD1.2   0   1   0      1
## PD1.3   0   1   0      0
## PD1.4   0   1   0      1
## CMP.1   0   0   1      0
## CMP.3   0   0   1      0
## CMP.4   0   0   1      1
## attr(,"assign")
## [1] 1 1 1 2
## attr(,"contrasts")
## attr(,"contrasts")$Group
## [1] "contr.treatment"
## 
## attr(,"contrasts")$Batch
## [1] "contr.treatment"
dea.results$fit.main$contrasts
##         Contrasts
## Levels   PD1vsCTL CMPvsCTL PD1vsCMP
##   CTL          -1       -1        0
##   PD1           1        0        1
##   CMP           0        1       -1
##   Batch2        0        0        0
#extract objects for further analysis:
fit.main <- dea.results$fit.main
listofcsv <- dea.results$listofcsv
numGenesChanged_all <- dea.results$numGenesChanged
save(listofcoef, listofcsv, eset_norm, eset_analysis, targets, file=file.path(dataDir,"afterTopTabs.Rda"), version=2)

3.4 Block 03 Analysis of Multiple Comparisons

3.4.1 Background

It is interesting to look for common patterns of regulation between different experimental conditions. In order to find the degree of overlap among the genes selected in different comparisons, a multiple comparisons analysis is performed. Venn diagrams and heatmaps are plotted for the different multiple comparisons.
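The overlap bookkeeping behind a Venn diagram reduces to set intersections of the per-comparison gene lists; a base-R sketch with toy gene sets:

```r
siglists <- list(PD1vsCTL = c("Lcn2", "Sox9", "Mastl"),
                 CMPvsCTL = c("Lcn2", "Stbd1"),
                 PD1vsCMP = c("Lcn2", "Sox9"))

Reduce(intersect, siglists)                      # shared by all comparisons
intersect(siglists$PD1vsCTL, siglists$PD1vsCMP)  # a pairwise overlap
```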

3.4.2 Load the parameters for MC analysis and set directories

#Parameters
source(file.path(paramDir, "03-MC.par"))
#Execution parameters
source(file.path(paramDir, "03-MC_analysistodo.par"))

For this example study, parameters were set as follows:

(loadMC_parameters <- read.table(file.path(paramDir, "03-MC.par"), header = FALSE, sep=";", as.is=TRUE))
##                                                                                          V1
## 1                                                                        dataDirN <- dades 
## 2                                                                   resultsDirN <- results 
## 3                                                   resultsSummFN <- ResultsSummary_MC.txt 
## 4                                                           inputMCRda <- afterTopTabs.Rda 
## 5                                                   venn_comparNames <- c(EffectTreatment) 
## 6                                                                 venn_compar <- list(1:3) 
## 7                                                                venn_pval <- c(adj.P.Val) 
## 8                                                                 venn_pval.thr <- c(0.05) 
## 9                                                                       venn_logFC <- c(1) 
## 10                                                               venn_include <- list(abs) 
## 11                                                           venn_pos = list(c(0, 0, 180)) 
## 12                                                                                         
## 13                                                                    colFeat = Gene.Symbol
## 14                                                                      venn_FC_col = logFC
## 15                                                                     venn_FC.thr <- c(1) 
## 16                                         hm_comparNames <- c(EffectTreatment, EffectPD1) 
## 17                                                              hm_compar <-  list(1:3, 1) 
## 18                                                    hm_groupexclude <- list(c(), c(CMP)) 
## 19                                                        hm_pval <- c(adj.P.Val, P.Value) 
## 20                                                            hm_pval.thr <- c(0.05, 0.05) 
## 21                                                                        hm_logFC <- logFC
## 22                                                                 hm_logFC.thr <- c(1, 1) 
## 23                            hm_palette <- colorRampPalette(c(blue, white, red))(n = 199) 
## 24                                                                     hm_clustCol <- TRUE 
## 25                                                           hm_clustCol.dist <- euclidian 
## 26                                                          hm_clustCol.method <- complete 
## 27                                                                 hm_clustRow.cor <- TRUE 
## 28                                                                    batcheffect <- FALSE 
## 29                                                                   batchcolName <- Batch 
## 30                                                                    batchcolName2 <- NULL
## 31                                                                  batchNumcolName <- NULL
## 32                                                                featureCol <- Gene.Symbol
(loadMC_todoparameters <- read.table(file.path(paramDir, "03-MC_analysistodo.par"), header = FALSE, sep=";", as.is=TRUE))
##                                   V1
## 1                Venn_plots <- TRUE 
## 2 Venn_sharedElems_extended <- TRUE 
## 3             Heatmap_plots <- TRUE 
## 4      hm_plots_interactive <- TRUE

Since directories could change from block to block, we set them again for this block:

dataDir <- file.path(workingDir, dataDirN)
resultsDir <- file.path(workingDir, resultsDirN)
rdaDir <- dataDir

3.4.3 Data for MC

The Rda file generated in Block 02-Differential Expression Analysis (DEA) will be used as input for MC:

load(file.path(dataDir, inputMCRda))

3.4.4 MC Venn Upset

Function mc_venn_upset() can be used to generate Venn diagrams and/or UpSet plots for the comparisons analyzed.

listsharedelems <- lapply(seq_along(venn_comparNames), function(v) {
  setwd(resultsDir)
  namescomp <- names(listofcsv)[venn_compar[[v]]]
  listsharedelems <- mc_venn_upset(listofcsv=listofcsv, namescomp=namescomp, label = venn_comparNames[v], colFeat = colFeat[v], 
                            colPval = venn_pval[v], pval = venn_pval.thr[v], colFC=venn_FC_col[v], FC=venn_FC.thr[v], include=venn_include[v], pltR = TRUE, 
                            pltPdf = TRUE, pltPng=FALSE, venn = TRUE, eul = FALSE, saveTables = TRUE, upsetPlot=FALSE,
                            saveTables_extended=Venn_sharedElems_extended,
                            colors = rainbow(length(namescomp)), trans = 0.5, 
                            cex1 = 1, rotation=0, position=venn_pos[[v]], cex2 = 1, resultsSummFN=resultsSummFN, outputDir=resultsDir)
  setwd(workingDir)
  return(listsharedelems)
})

names(listsharedelems) <- venn_comparNames
knitr::include_graphics(file.path(resultsDir, "VennDiagram.abs.EffectTreatment.adj.P.Val0.05.logFC1.pdf"))

3.4.5 MC Heatmap plots

Function mc_hm() can be used to generate heatmap plots for the comparisons analyzed.

mc_hm(listofcsv=listofcsv, hm_dataset=listofcsv[[1]], targets=targets.f, featureCol=featureCol, hm_comparNames=hm_comparNames, hm_compar=hm_compar, hm_groupexclude=hm_groupexclude, hm_pval=hm_pval, hm_pval.thr=hm_pval.thr, hm_logFC=hm_logFC, hm_logFC.thr=hm_logFC.thr, hm_palette=hm_palette, hm_clustCol=hm_clustCol, hm_clustCol.dist=hm_clustCol.dist, hm_clustCol.method=hm_clustCol.method, hm_clustRow.cor=hm_clustRow.cor, batcheffect=batcheffect, batchcolName=batchcolName, batchcolName2=batchcolName2, batchNumcolName=batchNumcolName, hm_plots_interactive=TRUE, resultsSummFN=resultsSummFN, outputDir=resultsDir)

## $EffectTreatment
## 
## $EffectPD1

3.5 Block 04 A-Analysis of Biological Significance (ABS-GSEA)

3.5.1 Background

The analysis of biological significance is based on gene set enrichment analysis (GSEA) @Subramanian2005 over two annotation databases: the Gene Ontology (GO) and the Reactome Pathway Knowledgebase @reactome:2018. The goal of this analysis is to determine, with one of the available statistical tests, whether a given gene set (usually a particular GO category or Reactome pathway) is over-represented in the list of selected genes (the sample) compared to a reference set (the population) from which it was selected. The reference set is usually taken to be all the genes analyzed in the study. In GSEA, all the genes analyzed in the study (filtered genes) are ranked by log fold change and used in the analysis. GSEA aggregates the per-gene statistics across genes within a gene set, therefore making it possible to detect situations where all genes in a predefined set change in a small but coordinated way @Subramanian2005.
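GSEA takes the full ranked gene list rather than a significant subset as input; building the named, sorted vector from a toptable (the shape expected by, e.g., clusterProfiler's GSEA functions) is a one-liner in base R (toy Entrez IDs and fold changes):

```r
toptab <- data.frame(EntrezID = c("67121", "52331", "16819"),
                     logFC    = c(-0.77, 0.76, -2.04))
geneList <- sort(setNames(toptab$logFC, toptab$EntrezID), decreasing = TRUE)
geneList  # 52331 first (highest logFC), 16819 last
```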

3.5.2 Load the parameters for ABS-GSEA and set directories

#Parameters
source(file.path(paramDir, "04-ABS-GSEA.par"))
#Execution parameters
source(file.path(paramDir, "04-ABS-GSEA_analysistodo.par"))

For this example study, parameters were set as follows:

(loadABSGSEA_parameters <- read.table(file.path(paramDir, "04-ABS-GSEA.par"), header = FALSE, sep=";", as.is=TRUE))
##                                                                  V1
## 1                                                dataDirN <- dades 
## 2                                           resultsDirN <- results 
## 3                     resultsSummFN <- ResultsSummary_ABS-GSEA.txt 
## 4                              inputABSGSEARda <- afterTopTabs.Rda 
## 5                 annotPackage <- clariomsmousetranscriptcluster.db
## 6                                  organism_annot  <- org.Mm.eg.db 
## 7                                                organism <- mouse 
## 8                                                            label=
## 9                      resultsSummFN <- ResultsSummary_ABS-GSEA.txt
## 10                                          ranking_metric <- logFC
## 11                                          geneColname <- EntrezID
## 12                                             keyType <- ENTREZID 
## 13                                                        label <- 
## 14                                                readable <- FALSE
## 15                                                  saveGMT <- TRUE
## 16                                               saveTables <- TRUE
## 17                                                  saveRda <- TRUE
## 18                                        GO_comparisons <- c(1:3) 
## 19 GO_pvalcutoff <- list(rep(0.05, 3), rep(0.05, 3), rep(0.05, 3)) 
## 20    GO_pAdjustMethod <- list(rep(BH, 3), rep(BH, 3), rep(BH, 3)) 
## 21                                              GO_minSetSize <- 3 
## 22                                            GO_maxSetSize <- 500 
## 23                              Reac_comparisons <- GO_comparisons 
## 24                                 Reac_pvalcutoff <- rep(0.05, 3) 
## 25                                Reac_pAdjustMethod <- rep(BH, 3) 
## 26                                            Reac_minSetSize <- 3 
## 27                                          Reac_maxSetSize <- 500
(loadABSGSEA_todoparameters <- read.table(file.path(paramDir, "04-ABS-GSEA_analysistodo.par"), header = FALSE, sep=";", as.is=TRUE))
##                                             V1
## 1                          GOAnalysis <- TRUE 
## 2             GOAnalysis.unfilt <- GOAnalysis 
## 3               GOAnalysis.filt <- GOAnalysis 
## 4                             GOPlots <- TRUE 
## 5             GO_categories <- list(BP,CC,MF) 
## 6                     ReactomeAnalysis <- TRUE
## 7 ReactomeAnalysis.unfilt <- ReactomeAnalysis 
## 8   ReactomeAnalysis.filt <- ReactomeAnalysis 
## 9                        ReactomePlots <- TRUE

Since directories could change from block to block, we set them again for this block:

dataDir <- file.path(workingDir, dataDirN)
resultsDir <- file.path(workingDir, resultsDirN)
rdaDir <- dataDir
gmtDir <- resultsDir

3.5.3 Data for ABS-GSEA

The Rda file generated in Block 02-Differential Expression Analysis (DEA) will be used as input for GSEA:

load(file.path(dataDir, inputABSGSEARda))

3.5.4 GO unfilt GSEA

Function abs_gsea_GOunfilt() can be used to perform GO analysis for all the comparisons and store the unfiltered results (no p-value threshold) in lists. It takes as input the top tables generated in the DEA and returns lists of unfiltered GO results, together with a table of the number of terms found at different p-value thresholds for each category analyzed.
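The companion table of term counts at different p-value thresholds amounts to a few lines of base R. A self-contained sketch with a mock result data frame (the `pvalue` column name and the threshold values are assumptions for illustration):

```r
# Mock unfiltered enrichment result (hypothetical p-values)
res <- data.frame(
  ID     = paste0("GO:", sprintf("%07d", 1:6)),
  pvalue = c(0.001, 0.02, 0.04, 0.08, 0.2, 0.6)
)

# Number of terms passing each p-value threshold
thresholds <- c(0.01, 0.05, 0.15, 0.25)
counts <- sapply(thresholds, function(thr) sum(res$pvalue < thr))
names(counts) <- paste0("p<", thresholds)
counts
# p<0.01 p<0.05 p<0.15 p<0.25
#      1      3      4      5
```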

namescomp <- names(listofcsv)[GO_comparisons]
GSEA_GOresults.unfilt <- abs_gsea_GOunfilt(listofTopNamed=listofcsv, namescomp=namescomp,
    annotPackage=annotPackage, organism_annot=organism_annot,
    ranking_metric=ranking_metric, geneColname=geneColname, keyType=keyType,
    GO_minSetSize=GO_minSetSize, GO_maxSetSize=GO_maxSetSize,
    resultsSummFN=resultsSummFN, saveRda=saveRda, saveGMT=saveGMT, label=label,
    outputDir=resultsDir, rdaDir=rdaDir, gmtDir=gmtDir)

3.5.5 GO filt GSEA plots

Function abs_gsea_GOfiltplot() filters the GO results according to the p-value thresholds set previously in the parameters. It takes as input the lists of unfiltered GO results and the p-value thresholds, and returns data frames of GO results filtered by p-value. The function also performs a GO significance analysis, producing plots of the most enriched GO terms.

GSEA_GOresults.filt <- abs_gsea_GOfiltplot(GSEAresGO=GSEA_GOresults.unfilt,
    GO_pvalcutoff=GO_pvalcutoff, GO_pAdjustMethod=GO_pAdjustMethod,
    saveTables=saveTables, GOPlots=GOPlots, label=label,
    resultsSummFN=resultsSummFN, outputDir=resultsDir)

3.5.6 Reactome pathways unfilt GSEA

Function abs_gsea_ReactomePAunfilt() can be used to perform Reactome analysis for all the comparisons and store the unfiltered results (no p-value threshold) in lists. It takes as input the top tables generated in the DEA and returns lists of unfiltered Reactome results, together with a table of the number of terms found at different p-value thresholds.

namescomp <- names(listofcsv)[Reac_comparisons]
GSEA_ReactomePA.unfilt <- abs_gsea_ReactomePAunfilt(listofTopNamed=listofcsv,
    namescomp=namescomp, organism=organism, organism_annot=organism_annot,
    Reac_minSetSize=Reac_minSetSize, Reac_maxSetSize=Reac_maxSetSize,
    resultsSummFN=resultsSummFN, saveRda=saveRda, saveGMT=saveGMT, label=label,
    outputDir=resultsDir, rdaDir=rdaDir, gmtDir=gmtDir)

3.5.7 Reactome filt GSEA plots

Function abs_gsea_ReactomePAfiltplot() filters the Reactome results according to the p-value thresholds set previously in the parameters. It takes as input the lists of unfiltered Reactome results and the p-value thresholds, and returns data frames of Reactome results filtered by p-value. The function also performs a Reactome significance analysis, producing plots of the most enriched Reactome pathways.

abs_gsea_ReactomePAfiltplot(GSEAresReac=GSEA_ReactomePA.unfilt,
    Reac_pvalcutoff=Reac_pvalcutoff, Reac_pAdjustMethod=Reac_pAdjustMethod,
    saveTables=saveTables, ReacPlots=ReactomePlots, label=label,
    resultsSummFN=resultsSummFN, outputDir=resultsDir)

3.6 Block 04 B-Analysis of Biological Significance (ABS-ORA)

3.6.1 Background

An over-representation analysis (ORA) will be performed over the GO-BP and Reactome pathway databases to find the terms enriched in the lists of genes up-regulated/down-regulated in common between the comparisons analyzed.
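Unlike GSEA, which ranks all filtered genes, ORA starts from discrete gene lists. As a sketch of how such common lists could be derived from two top tables (hypothetical data and gene identifiers; the `P.Value` and `logFC` column names and the 0.05/1 cutoffs mirror the `selected.pval`, `col_logFC`, and threshold parameters shown below):

```r
# Two toy top tables standing in for two real comparisons (hypothetical data)
top1 <- data.frame(gene    = c("g1", "g2", "g3", "g4"),
                   logFC   = c(2.0, -1.5, 0.2, 1.8),
                   P.Value = c(0.01, 0.02, 0.04, 0.20))
top2 <- data.frame(gene    = c("g1", "g2", "g3", "g5"),
                   logFC   = c(1.7, -2.2, 1.1, -1.4),
                   P.Value = c(0.03, 0.01, 0.30, 0.02))

# Significant up-regulated genes in each comparison (p < 0.05 and logFC > 1)
up1 <- top1$gene[top1$P.Value < 0.05 & top1$logFC > 1]
up2 <- top2$gene[top2$P.Value < 0.05 & top2$logFC > 1]

# Genes up-regulated in common between the comparisons: the input list for ORA
common_up <- intersect(up1, up2)
common_up
# [1] "g1"
```

The same selection with `logFC < -1` would give the common down-regulated list.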

3.6.2 Load the parameters for ABS-ORA and set directories

#Parameters
source(file.path(paramDir, "04-ABS-ORA.par"))
#Execution parameters
source(file.path(paramDir, "04-ABS-ORA_analysistodo.par"))

For this example study, parameters were set as follows:

(loadABSORA_parameters <- read.table(file.path(paramDir, "04-ABS-ORA.par"), header = FALSE, sep=";", as.is=TRUE))
##                                                                             V1
## 1                                                           dataDirN <- dades 
## 2                                                      resultsDirN <- results 
## 3                                 resultsSummFN <- ResultsSummary_ABS-ORA.txt 
## 4                                          inputABSORARda <- afterTopTabs.Rda 
## 5                          annotPackage <- clariomsmousetranscriptcluster.db  
## 6                                             organism_annot  <- org.Mm.eg.db 
## 7                                                           organism <- mouse 
## 8                                                                       label=
## 9                                  resultsSummFN <- ResultsSummary_ABS-ORA.txt
## 10                                                           universe <- NULL 
## 11                                                            sign <- c(1,-1) 
## 12                                                         keyType <- ENTREZID
## 13                                                    geneColname <- EntrezID 
## 14                                                   GO_categories <- list(BP)
## 15                                                          GO_minSetSize <- 3
## 16                                                        GO_maxSetSize <- 500
## 17                             selected.pval.go <- c(P.Value,P.Value,P.Value) 
## 18                                   selected.pvalthr.go <- c(0.05,0.05,0.05) 
## 19                                               selected.logFC.go <- c(1,1,1)
## 20                                                    col_logFC.go <- c(logFC)
## 21                                                     sign_logFC.go <- c(abs)
## 22                                                             GOPlots <- TRUE
## 23 GO_pvalcutoff <- rep(list(c(BP=0.15, CC=0.15, MF=0.15)),length(namescomp)) 
## 24    GO_pAdjustMethod <- rep(list(c(BP=BH, CC=BH, MF=BH)),length(namescomp)) 
## 25                                                                            
## 26                                                     col_entrez <- EntrezID 
## 27                                                       Reac_minSetSize <- 3 
## 28                                                     Reac_maxSetSize <- 500 
## 29                             Reac_pvalcutoff <- rep(0.15,length(namescomp)) 
## 30                            Reac_pAdjustMethod <- rep(BH,length(namescomp)) 
## 31                                                           ReacPlots <- TRUE
## 32                            selected.pval.reac <- c(P.Value,P.Value,P.Value)
## 33                                 selected.pvalthr.reac <- c(0.05,0.05,0.05) 
## 34                                             selected.logFC.reac <- c(1,1,1)
## 35                                                  col_logFC.reac <- c(logFC)
## 36                                                   sign_logFC.reac <- c(abs)
(loadABSORA_todoparameters <- read.table(file.path(paramDir, "04-ABS-ORA_analysistodo.par"), header = FALSE, sep=";", as.is=TRUE))
##                                              V1
## 1                              readable <- TRUE
## 2                               saveGMT <- TRUE
## 3                            saveTables <- TRUE
## 4                               saveRda <- TRUE
## 5                           GOAnalysis <- TRUE 
## 6              GOAnalysis.unfilt <- GOAnalysis 
## 7                GOAnalysis.filt <- GOAnalysis 
## 8                              GOPlots <- TRUE 
## 9              GO_categories <- list(BP,CC,MF) 
## 10                     ReactomeAnalysis <- TRUE
## 11 ReactomeAnalysis.unfilt <- ReactomeAnalysis 
## 12   ReactomeAnalysis.filt <- ReactomeAnalysis 
## 13                        ReactomePlots <- TRUE

Since directories could change from block to block, we set them again for this block:

dataDir <- file.path(workingDir, dataDirN)
resultsDir <- file.path(workingDir, resultsDirN)
rdaDir <- dataDir
gmtDir <- resultsDir

3.6.3 Data for ABS-ORA

The Rda file generated in Block 02-Differential Expression Analysis (DEA) will be used as input for ABS-ORA:

load(file.path(dataDir, inputABSORARda))

3.6.4 GO unfilt ORA

Function abs_ora_GOunfilt() can be used to perform an over-representation GO analysis for all the comparisons and store the unfiltered results (no p-value threshold) in lists. It takes as input the top tables generated in the DEA and returns lists of unfiltered GO results, together with a table of the number of terms found at different p-value thresholds for each category analyzed.

ORA_GOresults.unfilt <- abs_ora_GOunfilt(listofTopNamed=listofcsv, namescomp=namescomp,
    universe=universe, geneColname=geneColname, readable=readable,
    selected.pval.go=selected.pval.go, selected.pvalthr.go=selected.pvalthr.go,
    selected.logFC.go=selected.logFC.go, col_logFC.go=col_logFC.go,
    sign_logFC.go=sign_logFC.go,
    organism_annot=organism_annot, keyType=keyType, GO_categories=GO_categories,
    GO_minSetSize=GO_minSetSize, GO_maxSetSize=GO_maxSetSize,
    resultsSummFN=resultsSummFN, saveRda=saveRda, saveGMT=saveGMT, label=label,
    outputDir=resultsDir, rdaDir=rdaDir, gmtDir=gmtDir)

3.6.5 GO filt ORA plots

Function abs_ora_GOfiltplot() filters the GO results according to the p-value thresholds set previously in the parameters. It takes as input the lists of unfiltered GO results and the p-value thresholds, and returns data frames of GO results filtered by p-value. The function also performs a GO significance analysis, producing plots of the most enriched GO terms.

ORA_GOresults.filt <- abs_ora_GOfiltplot(ORAresGO=ORA_GOresults.unfilt, sign=sign,
    GO_pvalcutoff=GO_pvalcutoff, GO_pAdjustMethod=GO_pAdjustMethod,
    saveTables=saveTables, GOPlots=GOPlots, label=label,
    resultsSummFN=resultsSummFN, outputDir=resultsDir)

3.6.6 Reactome unfilt ORA

Function abs_ora_ReactomePAunfilt() can be used to perform an over-representation Reactome analysis for all the comparisons and store the unfiltered results (no p-value threshold) in lists. It takes as input the top tables generated in the DEA and returns lists of unfiltered Reactome results, together with a table of the number of terms found at different p-value thresholds.

ORA_ReactomePAresults.unfilt <- abs_ora_ReactomePAunfilt(listofTopNamed=listofcsv,
    namescomp=namescomp, universe=universe, col_entrez=col_entrez, readable=readable,
    selected.pval.reac=selected.pval.reac, selected.pvalthr.reac=selected.pvalthr.reac,
    selected.logFC.reac=selected.logFC.reac, col_logFC.reac=col_logFC.reac,
    sign_logFC.reac=sign_logFC.reac,
    organism=organism, organism_annot=organism_annot, keyType=keyType,
    Reac_minSetSize=Reac_minSetSize, Reac_maxSetSize=Reac_maxSetSize,
    resultsSummFN=resultsSummFN, saveRda=saveRda, saveGMT=saveGMT, label=label,
    outputDir=resultsDir, rdaDir=rdaDir, gmtDir=gmtDir)

3.6.7 Reactome filt ORA plots

Function abs_ora_ReactomePAfiltplot() filters the Reactome results according to the p-value thresholds set previously in the parameters. It takes as input the lists of unfiltered Reactome results and the p-value thresholds, and returns data frames of Reactome results filtered by p-value. The function also performs a Reactome significance analysis, producing plots of the most enriched Reactome pathways.

ORA_ReactomePAanalysis_filt <- abs_ora_ReactomePAfiltplot(ORAresReac=ORA_ReactomePAresults.unfilt,
    sign=sign, Reac_pvalcutoff=Reac_pvalcutoff, Reac_pAdjustMethod=Reac_pAdjustMethod,
    saveTables=saveTables, ReacPlots=ReacPlots, label=label,
    resultsSummFN=resultsSummFN, outputDir=resultsDir)

4 References